Chinese Discourse Segmentation Based on Punctuation Marks
Abstract
This paper addresses Chinese discourse segmentation based on punctuation marks. In particular, we propose a range of lexical, syntactic, positional, and punctuation features to train classifiers for Chinese discourse segmentation. Experimental results on the CDTB (Chinese Discourse Treebank) show that our punctuation-based method is well suited to Chinese discourse segmentation, achieving 89.2% accuracy.
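As a rough illustration of the setup the abstract describes, the sketch below treats each punctuation mark as a candidate segment boundary and collects a few simple positional and punctuation features for it. The mark inventory `CANDIDATE_MARKS`, the function names, and the feature names are all assumptions made for this example; the abstract does not list the paper's actual feature set, and the lexical and syntactic features (which would need a word segmenter and a parser) are omitted.

```python
# Assumed inventory of candidate boundary marks (full-width Chinese punctuation).
# The paper's actual inventory is not given in the abstract.
CANDIDATE_MARKS = ",;:。!?"


def candidate_boundaries(text):
    """Return the character offsets of every candidate punctuation mark."""
    return [i for i, ch in enumerate(text) if ch in CANDIDATE_MARKS]


def boundary_features(text, index):
    """Build a hypothetical feature dict for the punctuation mark at `index`.

    Only positional and punctuation features are sketched here; a classifier
    would take these (plus lexical and syntactic ones) as input and decide
    whether the mark is a true discourse segment boundary.
    """
    marks = candidate_boundaries(text)
    rank = marks.index(index)
    # Start of the span to the left: just after the previous mark, or 0.
    left_start = marks[rank - 1] + 1 if rank > 0 else 0
    # End of the span to the right: the next mark, or the end of the text.
    right_end = marks[rank + 1] if rank + 1 < len(marks) else len(text)
    return {
        "mark": text[index],
        "left_len": index - left_start,
        "right_len": right_end - index - 1,
        "first_char_right": text[index + 1] if index + 1 < len(text) else "",
        "is_last_mark": rank == len(marks) - 1,
    }


if __name__ == "__main__":
    sentence = "他来了,但是我走了。"
    for offset in candidate_boundaries(sentence):
        print(offset, boundary_features(sentence, offset))
```

Each feature dict could then be vectorized and passed to any standard classifier; which classifier the authors used is not stated in the abstract.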
Related resources
Discursive Usage of Six Chinese Punctuation Marks
Both rhetorical structure and punctuation have been helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of 6 Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against patterns around cue phrases in general. Result...
Application of Chinese Natural Language Generation in Semantic Web
RDF is the representation of the Semantic Web. When querying RDF documents, the result is a sub-graph of RDF data model or a number of triple statements. In this paper, we apply natural language generation technique to render the result into multi-sentential text for human comprehension. We investigate the effect of discourse segmentation on the generation of anaphora and punctuation marks in C...
Clause-based Discourse Segmentation of Arabic Texts
This paper describes a rule-based approach to segmenting Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first using only punctuation marks, then relying ...
Information-based Aspects of Punctuation
We offer a preliminary account of the information-based aspects of punctuation marks. We give our initial treatment within the Discourse Representation Theory and its segmented version. We hypothesize that this work will be useful in classifying the informational contributions of punctuation marks and bringing them to bear on the semantic characterization of written discourse.
On Generalized-Topic-Based Chinese Discourse Structure
Song Rou, Jiang Yuru, Wang Jingyi (Beijing Language and Culture University; Beijing University of Polytechnic Technology; Beijing Forest University; Beijing University of Information Science and Technology). Abstract: Due to the lack of external formal marks, components in Chinese discourse can hardly be categorized into the traditional syntactic system. In fact, Chinese is a typical topic-prominent la...